Skip to content

Conversation

@schlunma
Copy link
Contributor

@schlunma schlunma commented Nov 29, 2022

Description

This PR adds a function that updates the product attributes (and thus the metadata.yml files) with cube metadata before saving files. This ensures that the entries in metadata.yml match the cube metadata.

Currently modified metadata:

  • standard_name from cube.standard_name
  • long_name from cube.long_name
  • var_name from cube.short_name
  • units from cube.units
  • frequency from cube.attributes['frequency']

Closes #505

Documentation: https://esmvaltool--1837.org.readthedocs.build/projects/ESMValCore/en/1837/develop/preprocessor_function.html#metadata


Before you get started

Checklist

It is the responsibility of the author to make sure the pull request is ready to review. The icons indicate whether the item will be subject to the 🛠 Technical or 🧪 Scientific review.


To help with the number pull requests:

@schlunma schlunma added the bug Something isn't working label Nov 29, 2022
@schlunma schlunma added this to the v2.8.0 milestone Nov 29, 2022
@schlunma schlunma self-assigned this Nov 29, 2022
@schlunma schlunma marked this pull request as draft November 29, 2022 14:53
@codecov
Copy link

codecov bot commented Nov 29, 2022

Codecov Report

Merging #1837 (730071f) into main (5180b4f) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##             main    #1837   +/-   ##
=======================================
  Coverage   91.54%   91.54%           
=======================================
  Files         203      203           
  Lines       10945    10945           
=======================================
  Hits        10020    10020           
  Misses        925      925           

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@schlunma
Copy link
Contributor Author

@bouweandela this is possible solution to #505.

In your comment you mentioned also adapting frequencies accordingly. I thought about this some time now and found several possible solutions for this:

  1. Use a similar function than the one used in the CMOR checks. This one checks all time points and might be very expensive, especially if we do not make any assumptions about the cube's frequency and let the code check everything.
  2. Only check the first two points. This is much cheaper, but might lead to wrong results.
  3. Somehow couple this to the preprocessors used. E.g., if someone uses annual_statistics, we know that the data is yearly afterwards. This might not be straightforward to implement and might get confusing if multiple of the units-changing preprocessors are used.

In addition, I have some questions:

  • How do we deal with data that does not have a time dimensions anymore (e.g., after running climate_statistics with period=full). Use fx? Use None?
  • How do we deal with climatologies (e.g., after running climate_statistics with period=monthly? For monthly climatologies, do we simply use mon? I am not entirly sure as this is different to a standard monthly time series.

All these problems lead me to the conclusion that it might be best to tackle the frequency update at a later stage. This PR solves an actual existing problem, whereas no one complained about wrong frequencies before. So maybe just open an issue about this and merge this without adapting the frequencies?

@ESMValGroup/esmvaltool-coreteam any opinions?

@bouweandela
Copy link
Member

My idea wasn't nearly as advanced as that: i thought we could just update the frequency in metadata.yml using the value of the frequency or time_frequency attribute on the cube.

Prepeocessors who change the frequency or any of the other attributes should then take care of updating the appropriate cube attributes.

@schlunma
Copy link
Contributor Author

Right - that's much easier indeed. Two questions on this:

  1. Should I update all time preprocessors with the frequency attribute in this PR? If yes, I would need an answer to my questions above.
  2. Are you proposing to update the product attributes from all cube attributes? I guess this can be a little bit dangerous, for example, if a dataset uses a key that we also use in the raw data (e.g., project) then we might accidentally incorrectly overwrite the product attributes.

@bouweandela
Copy link
Member

Should I update all time preprocessors with the frequency attribute in this PR? If yes, I would need an answer to my questions above.

I would leave that for a new pull request.

Are you proposing to update the product attributes from all cube attributes? I guess this can be a little bit dangerous, for example, if a dataset uses a key that we also use in the raw data (e.g., project) then we might accidentally incorrectly overwrite the product attributes.

No, just the ones that are wrong after preprocessing.

@schlunma
Copy link
Contributor Author

All right, implemented.

One remaining question: Currently, if a cube property (e.g., cube.standard_name) is None, the metadata is update with an empty string. This is in line with our current procedure in the CMOR checks.

However, in the light of preprocessors that completely remove cube metadata (#651), this might lead to unexpected results. Would it be safer to just not update the corresponding values if it's None?

@schlunma
Copy link
Contributor Author

Corresponding frequency issue: #1840

@schlunma schlunma marked this pull request as ready for review November 30, 2022 09:06
@schlunma schlunma requested a review from bouweandela November 30, 2022 09:06
Copy link
Member

@bouweandela bouweandela left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good @schlunma! I tested this with the convert_units preprocessor function and that works fine now.

Copy link
Contributor

@valeriupredoi valeriupredoi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cheers guys - am the merger, so I am not requesting any changes, but I have a couple questions before I merge this piece of art 😁

@valeriupredoi
Copy link
Contributor

thanks a lot @schlunma and @bouweandela 🍺

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Units reported in metadata.yml may differ from actual units

4 participants